
Canonicalize volatile system prompt headers in OpenAI paths#528

Open
Thump604 wants to merge 1 commit into waybarrios:main from Thump604:604/issue-524-prompt-canonicalization

Conversation

@Thump604
Collaborator

Refs #524.

Summary

  • Add a small static system-prompt canonicalization helper with the currently validated x-anthropic-billing-header stripper.
  • Apply it to prepared system-role messages in Chat Completions and Responses paths before engine execution.
  • Cover the helper, Chat Completions preparation, Responses preparation, and existing Anthropic adapter behavior.

Local repro / observed behavior

On current waybarrios/vllm-mlx main (f068991 when this branch was cut), the Anthropic Messages adapter removes x-anthropic-billing-header: lines from request.system in vllm_mlx/api/anthropic_adapter.py::anthropic_to_openai, but the OpenAI server paths do not apply the same canonicalization:

  • vllm_mlx/server.py::_prepare_chat_messages
  • vllm_mlx/server.py::_prepare_responses_request

A Chat Completions system message or Responses instructions value containing:

x-anthropic-billing-header: account=abc; cch=rotating-hash

was still present in the prepared system message sent to the engine.
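A hypothetical minimal repro payload for the Chat Completions path (model name and surrounding fields are illustrative, not taken from the PR; the key point is the header line embedded in the system message):

```python
# Illustrative Chat Completions request body; on main, the header line below
# reaches the engine unmodified via _prepare_chat_messages.
payload = {
    "model": "example-model",  # placeholder, not from the PR
    "messages": [
        {
            "role": "system",
            "content": (
                "You are a helpful assistant.\n"
                "x-anthropic-billing-header: account=abc; cch=rotating-hash"
            ),
        },
        {"role": "user", "content": "Hello"},
    ],
}
assert "x-anthropic-billing-header" in payload["messages"][0]["content"]
```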

Expected behavior

The validated billing-header line is non-semantic request metadata and should be removed from system-role text before engine execution. User-role content with the same text is preserved, and user-visible timestamp text is not stripped.

Minimal patch shape

  • New vllm_mlx/api/prompt_canonicalize.py module with a static stripper list and canonicalize_system_prompt().
  • New canonicalize_system_messages() helper that copies only changed system messages.
  • Call the helper after existing message normalization in Chat Completions and after Responses input conversion.
  • Do not add runtime registration APIs or speculative strippers.
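A sketch of what the new module could look like, assembled from the names and regex quoted in this PR and review (the `_STRIPPERS` tuple, `canonicalize_system_prompt()`, `canonicalize_system_messages()`, and the `(?im)`-anchored pattern); the actual implementation may differ in detail:

```python
import re

# Static stripper list; currently only the validated billing-header line.
_STRIPPERS = (
    re.compile(r"(?im)^x-anthropic-billing-header:[^\n]*(?:\n|$)"),
)


def canonicalize_system_prompt(text):
    """Remove non-semantic metadata lines from system-role text.

    None/empty input passes through unchanged.
    """
    if not text:
        return text
    for pattern in _STRIPPERS:
        text = pattern.sub("", text)
    return text


def canonicalize_system_messages(messages):
    """Canonicalize system-role string content without mutating the input.

    Copies only the messages that actually change; returns the original
    list object when nothing changed.
    """
    out = None
    for i, msg in enumerate(messages):
        if msg.get("role") != "system" or not isinstance(msg.get("content"), str):
            continue
        cleaned = canonicalize_system_prompt(msg["content"])
        if cleaned != msg["content"]:
            if out is None:
                out = list(messages)
            out[i] = {**msg, "content": cleaned}
    return out if out is not None else messages
```

The copy-on-change shape mirrors the review's observation that the helper avoids mutation and returns the original list when no system message needed stripping.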

Explicitly not claimed

  • This does not add timestamp, MCP UUID, or session-ID strippers.
  • This does not change the SimpleEngine system-prefix KV-cache logic from PR #523 (feat: extend system-prompt KV cache to pure-LLM stream_chat path).
  • This does not include new TTFT or cache-hit-rate benchmark results.
  • This does not change media extraction, tool parsing, sampling, or decode controls.

Verification

AI_RUNTIME_BYPASS_SAFETY_GATE=1 PYTHONPATH=/opt/ai-runtime/worktrees/vllm-mlx/issue-524-prompt-canonicalization /opt/ai-runtime/venv-live/bin/python -m pytest tests/test_prompt_canonicalize.py tests/test_responses_api.py tests/test_anthropic_adapter.py tests/test_server.py::TestPromptCanonicalization -q
# 74 passed

uvx ruff check vllm_mlx/api/prompt_canonicalize.py vllm_mlx/server.py tests/test_prompt_canonicalize.py tests/test_server.py tests/test_responses_api.py
# All checks passed

/opt/ai-runtime/venv-live/bin/python -m black --check --target-version py312 vllm_mlx/api/prompt_canonicalize.py vllm_mlx/server.py tests/test_prompt_canonicalize.py tests/test_server.py tests/test_responses_api.py
# 5 files would be left unchanged

git diff --check
# clean

Collaborator

@janhilgard janhilgard left a comment


Clean implementation. The prompt_canonicalize.py module is well-isolated with an extensible _STRIPPERS tuple, and canonicalize_system_messages() correctly avoids mutation (copies only changed messages, returns the original list when nothing changes).

Review

  • Regex (?im)^x-anthropic-billing-header:[^\n]*(?:\n|$) — multiline + case-insensitive + anchored. Correctly strips the full line including trailing newline.
  • Only targets system role — user content is preserved (explicitly verified by the chat completion test).
  • None passthrough is correct.
  • Placement in server.py is right: after _normalize_messages() but before media extraction.
  • Tests cover the unit helper (strip, idempotent, None/empty), Chat Completions path, and Responses path.
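The anchoring and flag behavior called out above can be demonstrated with a standalone snippet (the pattern is quoted from this review and assumed to match the module's compiled regex):

```python
import re

# Pattern as quoted in the review: multiline + case-insensitive + line-anchored.
pattern = re.compile(r"(?im)^x-anthropic-billing-header:[^\n]*(?:\n|$)")

text = (
    "Answer tersely.\n"
    "X-Anthropic-Billing-Header: account=abc; cch=rotating-hash\n"
    "Never echo request metadata."
)
once = pattern.sub("", text)
# Full line removed, including its trailing newline, regardless of case.
assert once == "Answer tersely.\nNever echo request metadata."
# Idempotent: a second pass is a no-op.
assert pattern.sub("", once) == once
# The `$` alternative also removes a final header line with no trailing newline.
assert pattern.sub("", "x-anthropic-billing-header: account=abc") == ""
```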

Minor note

The Anthropic adapter in anthropic_adapter.py has its own inline regex (re.sub(r"x-anthropic-billing-header:[^\n]*\n?", "", system_text)) without (?im) flags or ^ anchor. Not blocking, but a future cleanup could have the adapter call canonicalize_system_prompt() instead of duplicating the pattern.
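The behavioral gap between the two patterns is easy to show (both regexes are quoted from this thread; the surrounding text is illustrative):

```python
import re

# The adapter's inline pattern as quoted above, versus the anchored module pattern.
adapter_pat = r"x-anthropic-billing-header:[^\n]*\n?"
anchored_pat = r"(?im)^x-anthropic-billing-header:[^\n]*(?:\n|$)"

text = "Note: x-anthropic-billing-header: foo is documented here.\nNext line.\n"
# Unanchored: also fires mid-line, eating the rest of that line.
assert re.sub(adapter_pat, "", text) == "Note: Next line.\n"
# Anchored: the mid-line occurrence is left alone.
assert re.sub(anchored_pat, "", text) == text
# The adapter pattern is also case-sensitive, so an upper-cased header survives it.
assert re.sub(adapter_pat, "", "X-Anthropic-Billing-Header: a\n") == "X-Anthropic-Billing-Header: a\n"
```

This illustrates why routing the adapter through canonicalize_system_prompt() would be a worthwhile cleanup rather than a pure deduplication.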

CI 9/9. LGTM.
